Multi-resolution cepstral features for phoneme recognition across speech sub-bands
Authors
Abstract
Multi-resolution sub-band cepstral features strive to exploit discriminative cues in localised regions of the spectral domain by supplementing the full-bandwidth cepstral features with sub-band cepstral features derived from several levels of sub-band decomposition. Multi-resolution feature vectors, formed by concatenating the sub-band cepstral features into an extended feature vector, are shown to yield better phoneme recognition performance than conventional MFCCs on the TIMIT database. Possible strategies for recombining the partial recognition scores of independent multi-resolution sub-band models are also explored. By exploiting the variation in signal-to-noise ratio across sub-bands for linearly weighted recombination of the log-likelihoods, we obtained improved phoneme recognition performance in broadband noise compared to MFCC features. This is an advantage over a purely sub-band approach using non-linear recombination, which is robust only to narrow-band noise.
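The two ideas in the abstract, building an extended multi-resolution cepstral vector and linearly recombining sub-band log-likelihoods using sub-band SNR, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: it assumes each sub-band is a contiguous, equal-width slice of a log mel filter-bank vector, that level l splits the spectrum into 2**l sub-bands (a dyadic split chosen for illustration), and that recombination weights are proportional to each sub-band's SNR in dB, clipped at 0 dB. The function names `multires_cepstra` and `snr_weighted_score` and the weighting rule are hypothetical; the abstract does not specify the exact decomposition or weights.

```python
import math
from typing import List, Sequence


def dct_ii(x: Sequence[float], n_coeffs: int) -> List[float]:
    """Unnormalised DCT-II of x, keeping at most n_coeffs coefficients.

    This is the transform conventionally used to map log filter-bank
    energies to cepstral coefficients (as in MFCCs).
    """
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * k * (i + 0.5) / n) for i in range(n))
        for k in range(min(n_coeffs, n))
    ]


def multires_cepstra(log_fbank: Sequence[float], levels: int, n_coeffs: int) -> List[float]:
    """Hypothetical multi-resolution cepstral feature vector for one frame.

    Level 0 is the full band; level l splits the filter-bank channels into
    2**l contiguous, equal-width sub-bands (dyadic split assumed for
    illustration). The cepstra of every band at every level are concatenated.
    """
    n_finest = 2 ** (levels - 1)
    if len(log_fbank) % n_finest != 0:
        raise ValueError("channel count must be divisible by 2**(levels - 1)")
    features: List[float] = []
    for level in range(levels):
        n_sub = 2 ** level
        width = len(log_fbank) // n_sub
        for b in range(n_sub):
            band = log_fbank[b * width:(b + 1) * width]
            features.extend(dct_ii(band, n_coeffs))
    return features


def snr_weighted_score(sub_band_loglik: Sequence[float], sub_band_snr_db: Sequence[float]) -> float:
    """Linearly weighted recombination of per-sub-band log-likelihoods.

    Assumed (hypothetical) weighting rule: weight proportional to the
    sub-band SNR in dB, clipped at 0 dB and normalised to sum to one.
    Falls back to equal weights if every sub-band is at or below 0 dB.
    """
    clipped = [max(snr, 0.0) for snr in sub_band_snr_db]
    total = sum(clipped)
    if total == 0.0:
        weights = [1.0 / len(clipped)] * len(clipped)
    else:
        weights = [c / total for c in clipped]
    return sum(w * ll for w, ll in zip(weights, sub_band_loglik))


# Usage with made-up numbers: 24 log mel channels, 3 resolution levels.
frame = [math.log(1.0 + i) for i in range(24)]
print(len(multires_cepstra(frame, levels=3, n_coeffs=4)))  # prints 28
print(snr_weighted_score([-120.0, -95.0, -110.0], [15.0, 5.0, 0.0]))  # prints -113.75
```

With 24 channels and three levels there are 1 + 2 + 4 = 7 bands, each contributing 4 coefficients, hence 28 features. Giving every sub-band the same SNR reduces the recombination to a plain average of the sub-band scores, so the SNR term is what lets cleaner sub-bands dominate under noise.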
Similar articles
Sub-banded reconstructed phase spaces for speech recognition
A novel method combining filter banks and reconstructed phase spaces is proposed for the modeling and classification of speech. Reconstructed phase spaces, which are based on dynamical systems theory, have advantages over spectral-based analysis methods in that they can capture nonlinear or higher-order statistics. Recent work has shown that the natural measure of a reconstructed phase space ca...
Compact Speech Features Based on Wavelet Transform and PCA with Application to Speaker Identification
The main goal of this paper is to find some effective methods to improve the performance of speaker identification systems. In speaker identification, we use the wavelet transform to decompose the speech signals into several frequency bands and then use cepstral coefficients to capture the individualities of the vocal tract within the bands of interest, based on the acoustic characteristics of the human ear. I...
Some Applications of A Priori Knowledge in Multi-Stream HMM and HMM/ANN Based ASR
Multi-band ASR was largely inspired by the extremely high level of redundancy in the spectral signal representation which can be inferred from Fletcher’s product-of-errors rule for human speech perception. Indeed, the main aim of the multi-band approach is to exploit this redundancy in order to overcome the problem of data mismatch (while making no assumptions about noise type) by focusing recog...
A sub-band-based feature reconstruction approach for robust speaker recognition
Although the field of automatic speaker or speech recognition has been extensively studied over the past decades, the lack of robustness has remained a major challenge. The missing data technique (MDT) is a promising approach. However, its performance depends on the correlation across frequency bands. This paper presents a new reconstruction method for feature enhancement based on the trait. In...
Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
In recent years, automatic recognition of emotional states from speech in noisy conditions has become an important research topic in emotional speech recognition. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...